Abstract:Large language models (LLMs) are increasingly used in academic peer review, yet their reliability, alignment with human judgment, and robustness to adversarial attacks remain poorly understood. We present a systematic benchmark of LLM-as-a-Reviewer on 898 papers stratified from NeurIPS and ICLR, evaluating 12 LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions and diverge from humans in topical emphasis, under-flagging Clarity and over-flagging Reproducibility, while producing reviews two to three times longer with lower lexical diversity and a more standardized vocabulary. Prompt injection remains highly effective. Simple hidden instructions can promote low-scoring papers to acceptance-level ratings in a substantial fraction of cases, with effectiveness varying sharply across model families. While LLMs offer utility in structuring evaluations, their integration into peer review requires safeguards against both intrinsic biases and adversarial risks.
Abstract:As robots become increasingly integrated into daily life, understanding responses to robot mistreatment carries important ethical and design implications. This mixed-methods study (N = 201) examined how anthropomorphic levels and moral foundations shape reactions to robot abuse. Participants viewed videos depicting physical mistreatment of robots varying in humanness (Spider, Twofoot, Humanoid) and completed measures assessing moral foundations, anger, and social distance. Results revealed that anthropomorphism determines whether people extend moral consideration to robots, while moral foundations shape how they reason about such consideration. Qualitative analysis revealed distinct reasoning patterns: low-progressivism individuals employed character-based judgments, while high-progressivism individuals engaged in future-oriented moral deliberation. Findings offer implications for robot design and policy communication.
Abstract:Social robots like Moxie are designed to form strong emotional bonds with children, but their abrupt discontinuation can cause significant struggles and distress to children. When these services end, the resulting harm raises complex questions of who bears responsibility when children's emotional bonds are broken. Using the Moxie shutdown as a case study through a qualitative survey of 72 U.S. participants, our findings show that the responsibility is viewed as a shared duty across the robot company, parents, developers, and government. However, these attributions varied by political ideology and parental status of whether they have children. Participants' perceptions of whether the robot service should continue are highly polarized; supporters propose technical, financial, and governmental pathways for continuity, while opponents cite business realities and risks of unhealthy emotional dependency. Ultimately, this research contributes an empirically grounded shared responsibility framework for safeguarding child-robot companionship by detailing how accountability is distributed and contested, informing concrete design and policy implications to mitigate the emotional harm of robot discontinuation.
Abstract:Robots with anthropomorphic features are increasingly shaping how humans perceive and morally engage with them. Our research investigates how different levels of anthropomorphism influence protective responses to robot abuse, extending the Computers as Social Actors (CASA) and uncanny valley theories into a moral domain. In an experiment, we invite 201 participants to view videos depicting abuse toward a robot with low (Spider), moderate (Two-Foot), or high (Humanoid) anthropomorphism. To provide a comprehensive analysis, we triangulate three modalities: self-report surveys measuring emotions and uncanniness, physiological data from automated facial expression analysis, and qualitative reflections. Findings indicate that protective responses are not linear. The moderately anthropomorphic Two-Foot robot, rated highest in eeriness and "spine-tingling" sensations consistent with the uncanny valley, elicited the strongest physiological anger expressions. Self-reported anger and guilt are significantly higher for both the Two-Foot and Humanoid robots compared to the Spider. Qualitative findings further reveal that as anthropomorphism increases, moral reasoning shifts from technical assessments of property damage to condemnation of the abuser's character, while governance proposals expand from property law to calls for quasi-animal rights and broader societal responsibility. These results suggest that the uncanny valley does not dampen moral concern but paradoxically heightens protective impulses, offering critical implications for robot design, policy, and future legal frameworks.